Main solution

Getting Started

Installing and loading the required libraries

Note: Ensure that the pacman package has already been installed.

The following R packages will be used:

  • tidytext

  • tidyverse

  • readtext

  • quanteda

  • jsonlite

  • igraph

  • tidygraph

  • ggraph

  • visNetwork

  • clock

  • graphlayouts

  • plotly

  • ggiraph

pacman::p_load(tidytext, readtext, quanteda, tidyverse, jsonlite, igraph, tidygraph, ggraph, visNetwork, clock, graphlayouts,plotly,ggiraph)

Importing JSON File

Direct import of the mc3.json file shows an error message indicating that there’s an invalid character in the JSON text, specifically “NaN”. As “NaN” is not recognised as a valid value, preprocessing of the JSON file to replace “NaN” is required.

In the code chunk below, mc3.json is first imported, then all instances of “NaN” are replaced with “null”, and the processed file is written into a json file mc3_fixed.json for later use.

# Read the JSON file as text
json_text <- readLines("data/mc3.json")
Warning in readLines("data/mc3.json"): incomplete final line found on
'data/mc3.json'
# Replace "NaN" with "null"
json_text_fixed <- gsub("NaN", "null", json_text)

# Write the fixed JSON text back to a file
writeLines(json_text_fixed, "data/mc3_fixed.json")

Importing preprocessed mc3_fixed.json file

mc3_data <- fromJSON("data/mc3_fixed.json")

Check dataframe

  • Opens new tabs within R workspace, not shown in website

  • Example of the view is shown in the screenshot tab below

view(mc3_data[["nodes"]])
view(mc3_data[["links"]])

mc3_data[[“nodes’]

mc3_data[[“links”]]

View dataframe

  • Similar info as shown above
glimpse(mc3_data)
List of 5
 $ directed  : logi TRUE
 $ multigraph: logi TRUE
 $ graph     : Named list()
 $ nodes     :'data.frame': 60520 obs. of  15 variables:
  ..$ type             : chr [1:60520] "Entity.Organization.Company" "Entity.Organization.Company" "Entity.Organization.Company" "Entity.Organization.Company" ...
  ..$ country          : chr [1:60520] "Uziland" "Mawalara" "Uzifrica" "Islavaragon" ...
  ..$ ProductServices  : chr [1:60520] "Unknown" "Furniture and home accessories" "Food products" "Unknown" ...
  ..$ PointOfContact   : chr [1:60520] "Rebecca Lewis" "Michael Lopez" "Steven Robertson" "Anthony Wyatt" ...
  ..$ HeadOfOrg        : chr [1:60520] "Émilie-Susan Benoit" "Honoré Lemoine" "Jules Labbé" "Dr. Víctor Hurtado" ...
  ..$ founding_date    : chr [1:60520] "1954-04-24T00:00:00" "2009-06-12T00:00:00" "2029-12-15T00:00:00" "1972-02-16T00:00:00" ...
  ..$ revenue          : num [1:60520] 5995 71767 0 0 4747 ...
  ..$ TradeDescription : chr [1:60520] "Unknown" "Abbott-Gomez is a leading manufacturer and supplier of high-quality furniture and home accessories, catering to"| __truncated__ "Abbott-Harrison is a leading manufacturer of high-quality food products, including baked goods, snacks, and bev"| __truncated__ "Unknown" ...
  ..$ _last_edited_by  : chr [1:60520] "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" ...
  ..$ _last_edited_date: chr [1:60520] "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
  ..$ _date_added      : chr [1:60520] "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
  ..$ _raw_source      : chr [1:60520] "Existing Corporate Structure Data" "Existing Corporate Structure Data" "Existing Corporate Structure Data" "Existing Corporate Structure Data" ...
  ..$ _algorithm       : chr [1:60520] "Automatic Import" "Automatic Import" "Automatic Import" "Automatic Import" ...
  ..$ id               : chr [1:60520] "Abbott, Mcbride and Edwards" "Abbott-Gomez" "Abbott-Harrison" "Abbott-Ibarra" ...
  ..$ dob              : chr [1:60520] NA NA NA NA ...
 $ links     :'data.frame': 75817 obs. of  11 variables:
  ..$ start_date       : chr [1:75817] "2016-10-29T00:00:00" "2035-06-03T00:00:00" "2028-11-20T00:00:00" "2024-09-04T00:00:00" ...
  ..$ type             : chr [1:75817] "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" ...
  ..$ _last_edited_by  : chr [1:75817] "Pelagia Alethea Mordoch" "Niklaus Oberon" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" ...
  ..$ _last_edited_date: chr [1:75817] "2035-01-01T00:00:00" "2035-07-15T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
  ..$ _date_added      : chr [1:75817] "2035-01-01T00:00:00" "2035-07-15T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
  ..$ _raw_source      : chr [1:75817] "Existing Corporate Structure Data" "Oceanus Corporations Monthly - Jun '35" "Existing Corporate Structure Data" "Existing Corporate Structure Data" ...
  ..$ _algorithm       : chr [1:75817] "Automatic Import" "Manual Entry" "Automatic Import" "Automatic Import" ...
  ..$ source           : chr [1:75817] "Avery Inc" "Berger-Hayes" "Bowers Group" "Bowman-Howe" ...
  ..$ target           : chr [1:75817] "Allen, Nichols and Thompson" "Jensen, Morris and Downs" "Barnett Inc" "Bennett Ltd" ...
  ..$ key              : int [1:75817] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ end_date         : chr [1:75817] NA NA NA NA ...
Note

mc3_date[[“nodes”]] dataframe contains 15 columns and 60520 rows.

mc3_date[[“links”]] dataframe contains 11 columns and 75817 rows.

Note

On closer inspection of mc3_data, we note some issues to be rectified:

  1. Columns containing dates are treated as “Character” data type instead of date data type, which is incorrect. Thus, the data type of the following fields need to be changed to “Date”” data type:
    • founding_date
    • _last_edited_date
    • _date_added
    • start_date
    • _last_edited_date
    • _date_added
    • dob
  2. Some columns have missing values, which need to be handled appropriately for ease of later analysis.
  3. Some columns are prefixed with “_”, we remove them to reduce chance of bugs later

Missing Values

Identify the percentage of missing values within the dataset

# Function to calculate missing value percentages
calculate_missing_percentage <- function(df) {
  total_values <- nrow(df) * ncol(df)
  missing_values <- sum(is.na(df))
  missing_percentage <- (missing_values / total_values) * 100
  return(missing_percentage)
}
nodes_missing_percentage <- calculate_missing_percentage(mc3_data[["nodes"]])
nodes_missing_percentage
[1] 35.11952
nodes_missing_by_column <- sapply(mc3_data[["nodes"]], function(x) sum(is.na(x)) / length(x) * 100)
nodes_missing_by_column
             type           country   ProductServices    PointOfContact 
          0.00000           0.00000          85.34204          85.38334 
        HeadOfOrg     founding_date           revenue  TradeDescription 
         85.35691          85.34204          85.36847          85.34204 
  _last_edited_by _last_edited_date       _date_added       _raw_source 
          0.00000           0.00000           0.00000           0.00000 
       _algorithm                id               dob 
          0.00000           0.00000          14.65796 
links_missing_percentage <- calculate_missing_percentage(mc3_data[["links"]])
links_missing_percentage
[1] 9.059973
links_missing_by_column <- sapply(mc3_data[["links"]], function(x) sum(is.na(x)) / length(x) * 100)
links_missing_by_column
       start_date              type   _last_edited_by _last_edited_date 
        0.1187069         0.0000000         0.0000000         0.0000000 
      _date_added       _raw_source        _algorithm            source 
        0.0000000         0.0000000         0.0000000         0.0000000 
           target               key          end_date 
        0.0000000         0.0000000        99.5410000 
Note

Nodes: Overall, there are 35.12% missing values. While most columns have no missing values, the majority of those with missing data pertain to optional attributes:

  • ProductServices (Optional) - 85.34%

  • PointOfContact (Optional)- 85.38%

  • HeadofOrg (Optional) - 85.36%

  • founding_date - 85.34%

  • revenue (Optional) - 85.37%

  • TradeDescription (Optional) - 85.34%

  • dob - 14.66%

Links: Overall, there are 9.06% missing values. Most of the columns do not contain missing values, except for:

  • start_date - 0.12%

  • end_date (Optional) - 99.54%

In addition, according to the VAST2024 - MC3 Data Description file, all empty values in the revenue column are supposed to have been set to 0. However, there are still some values with “NA”.

Setting empty values in revenue to 0

Set NA values to 0 to aid analysis

# Create a copy of mc3_data
mc3_data2 <- mc3_data

# Set empty values in revenue to 0 and save it to the new list
mc3_data2$nodes$revenue <- ifelse(is.na(mc3_data2$nodes$revenue) | mc3_data2$nodes$revenue == "", 0, mc3_data2$nodes$revenue)

Verify changes

# ensure no more missing values in revenue column
sum(is.na(mc3_data2$nodes$revenue))
[1] 0

Rename Columns

Remove prefix “_” from columns to reduce chance of issues later

# Function to remove leading underscores from column names
remove_leading_underscores <- function(df) {
  colnames(df) <- gsub("^_", "", colnames(df))
  return(df)
}

# Create a copy of mc3_data2 and name it mc3_data3
mc3_data3 <- mc3_data2

# Apply the function to the nodes and links data frames in mc3_data3
mc3_data3$nodes <- remove_leading_underscores(mc3_data3$nodes)
mc3_data3$links <- remove_leading_underscores(mc3_data3$links)

Verify changes

colnames(mc3_data3$nodes)
 [1] "type"             "country"          "ProductServices"  "PointOfContact"  
 [5] "HeadOfOrg"        "founding_date"    "revenue"          "TradeDescription"
 [9] "last_edited_by"   "last_edited_date" "date_added"       "raw_source"      
[13] "algorithm"        "id"               "dob"             
colnames(mc3_data3$links)
 [1] "start_date"       "type"             "last_edited_by"   "last_edited_date"
 [5] "date_added"       "raw_source"       "algorithm"        "source"          
 [9] "target"           "key"              "end_date"        

Standardising Date Time Formats

In preparation for temporal analysis

# Create a copy of mc3_data3 and name it mc3_data4
mc3_data4 <- mc3_data3

# Convert date columns to Date-Time type
mc3_data4$nodes <- mc3_data4$nodes %>%
  mutate(
    founding_date = ymd_hms(founding_date),
    last_edited_date = ymd_hms(last_edited_date),
    date_added = ymd_hms(date_added),
    dob = ymd_hms(dob)
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `dob = ymd_hms(dob)`.
Caused by warning:
!  176 failed to parse.
mc3_data4$links <- mc3_data4$links %>%
  mutate(
    start_date = ymd_hms(start_date),
    last_edited_date = ymd_hms(last_edited_date),
    date_added = ymd_hms(date_added),
    end_date = ymd_hms(end_date)
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `start_date = ymd_hms(start_date)`.
Caused by warning:
!  14630 failed to parse.
Note

The ymd_hms function is designed to work with character vectors and will return NA for any existing NA values. This means that any NA value in the original columns will remain NA after the conversion.

Verify changes

# View the first few rows of the date columns in nodes
head(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob))
  founding_date last_edited_date date_added  dob
1    1954-04-24       2035-01-01 2035-01-01 <NA>
2    2009-06-12       2035-01-01 2035-01-01 <NA>
3    2029-12-15       2035-01-01 2035-01-01 <NA>
4    1972-02-16       2035-01-01 2035-01-01 <NA>
5    1954-04-06       2035-01-01 2035-01-01 <NA>
6    2031-09-30       2035-01-01 2035-01-01 <NA>
# View the first few rows of the date columns in links
head(mc3_data4$links %>% select(start_date))
  start_date
1 2016-10-29
2 2035-06-03
3 2028-11-20
4 2024-09-04
5 2034-11-12
6 2007-04-06
# Summary of date columns in nodes
summary(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob))
 founding_date                     last_edited_date               
 Min.   :1945-01-01 00:00:00.000   Min.   :2035-01-01 00:00:00.0  
 1st Qu.:1968-01-11 00:00:00.000   1st Qu.:2035-01-01 00:00:00.0  
 Median :1991-07-03 00:00:00.000   Median :2035-01-01 00:00:00.0  
 Mean   :1991-04-22 15:54:58.072   Mean   :2035-01-02 10:34:13.4  
 3rd Qu.:2014-09-04 12:00:00.000   3rd Qu.:2035-01-01 00:00:00.0  
 Max.   :2035-12-29 00:00:00.000   Max.   :2036-01-15 00:00:00.0  
 NA's   :51649                                                    
   date_added                         dob                         
 Min.   :2035-01-01 00:00:00.0   Min.   :1970-01-02 00:00:00.000  
 1st Qu.:2035-01-01 00:00:00.0   1st Qu.:1978-01-30 00:00:00.000  
 Median :2035-01-01 00:00:00.0   Median :1986-02-06 00:00:00.000  
 Mean   :2035-01-02 10:28:32.2   Mean   :1987-05-23 22:21:33.182  
 3rd Qu.:2035-01-01 00:00:00.0   3rd Qu.:1995-05-13 00:00:00.000  
 Max.   :2036-01-15 00:00:00.0   Max.   :2017-03-20 00:00:00.000  
                                 NA's   :9047                     
# Summary of date columns in links
summary(mc3_data4$links %>% select(start_date))
   start_date                    
 Min.   :1952-05-31 00:00:00.00  
 1st Qu.:2015-08-18 00:00:00.00  
 Median :2024-03-22 00:00:00.00  
 Mean   :2022-11-23 10:50:43.11  
 3rd Qu.:2030-12-13 00:00:00.00  
 Max.   :2035-12-29 00:00:00.00  
 NA's   :14720                   
# Check the types of the date columns in nodes
str(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob))
'data.frame':   60520 obs. of  4 variables:
 $ founding_date   : POSIXct, format: "1954-04-24" "2009-06-12" ...
 $ last_edited_date: POSIXct, format: "2035-01-01" "2035-01-01" ...
 $ date_added      : POSIXct, format: "2035-01-01" "2035-01-01" ...
 $ dob             : POSIXct, format: NA NA ...
# Check the types of the date columns in links
str(mc3_data4$links %>% select(start_date))
'data.frame':   75817 obs. of  1 variable:
 $ start_date: POSIXct, format: "2016-10-29" "2035-06-03" ...
view(mc3_data4[["nodes"]])
view(mc3_data4[["links"]])

Split Words

The steps below will be used to split text in type column of nodes into two columns: namely type1 and type2.

# Make a copy of mc3_data4
mc3_data5 <- mc3_data4

# Split the type column into two columns
mc3_data5$nodes <- mc3_data5$nodes %>%
  mutate(
    type1 = sub("^(\\S+).*", "\\1", type),
    type2 = sub("^\\S+\\.(.*)", "\\1", type)
  )

# If there's only one word in type, set type2 to NA
mc3_data5$nodes$type2 <- ifelse(grepl("\\.", mc3_data5$nodes$type), mc3_data5$nodes$type2, NA)

# Remove the original 'type' column
mc3_data5$nodes <- mc3_data5$nodes %>%
  select(-type)

The steps below will be used to split text in type column of links into two columns: namely type1 and type2.

# Make a copy of mc3_data4
mc3_data6 <- mc3_data5

# Split the type column into two columns
# There are no special cases, exception left blank
mc3_data6$links <- mc3_data6$links %>%
  mutate(
    type1 = sub("(.*?\\..*?)(\\.[^.]+)?$", "\\1", type),
    type2 = ifelse(grepl("\\.", type), sub(".*\\.", "", type), "")
  )

# remove the original 'type' column
mc3_data6$links <- mc3_data6$links %>%
  select(-type)

Verify changes

# View the first few rows of the type columns in nodes
head(mc3_data6$nodes %>% select(type1,type2))
                        type1   type2
1 Entity.Organization.Company Company
2 Entity.Organization.Company Company
3 Entity.Organization.Company Company
4 Entity.Organization.Company Company
5 Entity.Organization.Company Company
6 Entity.Organization.Company Company
# View the first few rows of the type columns in links
head(mc3_data6$links %>% select(type1,type2))
       type1           type2
1 Event.Owns Shareholdership
2 Event.Owns Shareholdership
3 Event.Owns Shareholdership
4 Event.Owns Shareholdership
5 Event.Owns Shareholdership
6 Event.Owns Shareholdership
view(mc3_data6[["nodes"]])
view(mc3_data6[["links"]])

Extract Nodes

For Question 1

#keep only necessary columns
mc3_nodes_1 <- as_tibble(mc3_data6$nodes) %>%
  select (-TradeDescription,
          -last_edited_by,
          -last_edited_date,
          -algorithm,
          -dob,
          -type1)

Save as rds file for future use

write_rds(mc3_nodes_1, "data/rds/mc3_nodes_1.rds")

Load Data

Load rds file

Note: rds files can be loaded independently of the data wrangling steps above to save time

mc3_links_1 <- readRDS("data/rds/mc3_links_1.rds")
mc3_nodes_1 <- readRDS("data/rds/mc3_nodes_1.rds")

Changes in Corporate Structures Over Time

The plot shows how transaction volume changes over time, which helps identify periods of increased or decreased activity

transactions_over_time <- mc3_links_1 %>%
  group_by(start_date) %>%
  summarize(count = n()) %>%
  drop_na()

Number of Transactions over Time

Number of links can be used to determine transactions over time

ggplot(transactions_over_time, aes(x = start_date, y = count)) +
  geom_line() +
  labs(title = "Transactions Over Time", x = "Date", y = "Number of Transactions")
Note

The dataset spans from year 1952 to 2035.

We can see that from the start of the dataset until about year 2000, there were relatively few transactions. There was a small spike after year 2000, proceeded by exponential growth around 2005. However, there was a dip in transactions in 2035.

The dip could be due to effects after SouthSeafood Express Corp was caught for illegal behaviour and eventually closed in 2035.

Analysis should focus on transactions from year 2005 onwards. Data analysed should also be aggregated by year.

Filter data

Filter data to only keep transactions from 2000 (5 years before 2005) to 2035 (end of dataset). We keep some data that occurs before the start of our period of interest to capture any recent changes to entities.

# Filter the data frames to keep only data from the year 2000 and onwards
mc3_links_1_filtered <- mc3_links_1 %>%
  filter(start_date >= as.Date("2000-01-01"))

Aggregate Data by Year

# Extract year for aggregation
mc3_links_1_filtered2 <- mc3_links_1_filtered %>%
  mutate(transaction_year = year(start_date))

# Calculate the number of transactions per year
yearly_txns <- mc3_links_1_filtered2 %>%
  group_by(transaction_year) %>%
  summarise(num_transactions = n())

# Plot the number of transactions per year
ggplot(yearly_txns, aes(x = transaction_year, y = num_transactions)) +
  geom_line(color = "blue") +
  labs(title = "Number of Transactions Per Year",
       x = "Year",
       y = "Number of Transactions") +
  theme_minimal()
Note

It is now clearer that the rapid growth in transactions started around 2005, before reaching its peak at 2034 and sharply dropping in 2035, likely due to after effects of the SouthSeafood Express Corp incident.

Number of Active Companies Per Year

Drop na values

mc3_nodes2_1 <- mc3_nodes_1 %>%
  drop_na(founding_date) # removes Persons and Persons CEO

Number of nodes can be used to determine the number of active companies per year.

# Extract year for aggregation
mc3_nodes3_1 <- mc3_nodes2_1 %>%
  mutate(active_year = floor_date(founding_date, "year"))

# Calculate the number of active companies per year
active_companies <- mc3_nodes3_1 %>%
  group_by(active_year) %>%
  summarise(num_active_companies = n())

Summary

# Calculate the summary statistics
summary_stats <- summary(active_companies$num_active_companies)
summary_stats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  79.00   88.50   96.00   97.48  104.00  136.00 
# Extract and save the mean
# Round to 2 decimal places
mean_active_companies <- round(summary_stats["Mean"], 2)

Plot graph

# Plot the number of active companies over time
ggplot(active_companies, aes(x = active_year, y = num_active_companies)) +
  # line plot
  geom_line(color = "darkgreen") +
  labs(title = "Number of Active Companies Over Time",
       x = "Date",
       y = "Number of Active Companies") +
  # mean line
  geom_hline(aes(yintercept = mean_active_companies), 
             linetype = "dotted", color = "blue") +
  annotate("text", x = min(active_companies$active_year), 
           y = mean_active_companies, 
           label = paste("Mean:", mean_active_companies), 
           hjust = 0, vjust = -1, color = "blue") +
  theme_minimal()
Note

While there are fluctuations in the number of active companies over time, there is generally an increasing trend of the number of active companies over time, especially around 2010 onwards. This period shows a rising trend with the number of active companies reaching the highest values in the dataset. This is similar to that observed in the number of transactions over time, seen above.

We also see a dip around 2035, before the numbers increase again. Also likely due to the after effects of the SouthSeafood Express Corp incident.

Centrality Measures

Modifying network nodes and edges

Prepare the edges dataframe for network analysis by:

  • Ensuring all edges are unique.

  • Converting columns to a uniform type.

  • Calculating the weight of each edge (how many times each connection occurs).

  • Removing any self-loops.

mc3_edges <-
  as_tibble(mc3_links_1_filtered2) %>%
  distinct() %>%
  mutate(source = as.character(source),
         target = as.character(target),
         type = as.character(type2),
         tyear = as.integer(transaction_year)) %>%
  group_by(source, target, type,tyear) %>%
  summarise(weights = n()) %>%
  filter(source != target) %>%
  ungroup()
`summarise()` has grouped output by 'source', 'target', 'type'. You can
override using the `.groups` argument.
Note

The resulting mc3_edges tibble contains the columns source, target, type, year, and weights, where each row represents a unique edge between two nodes with a specific type, and the weights column represents the number of times that edge occurs.

Clean and preprocess the nodes data by:

  • Ensuring that each column has the correct data type for analysis.

  • Selecting only the necessary columns for further analysis or visualization.

mc3_nodes <- as_tibble(mc3_nodes_1) %>%
  mutate(country = as.character(country), 
         id = as.character(id), 
         ProductServices = as.character(ProductServices), 
         revenue = as.numeric(as.character(revenue)), 
         type = as.character(type2)) %>%
  select(id, country, type, revenue, ProductServices)
Note

The resulting mc3_nodes tibble contains the cleaned and correctly typed columns id, country, type, revenue, and ProductServices.

Keeping unique values

Edges

unique_transaction_types_edges <- mc3_edges %>%
  select(type) %>%
  distinct()

# Display the unique transaction types
print(unique_transaction_types_edges)
# A tibble: 4 × 1
  type               
  <chr>              
1 Shareholdership    
2 BeneficialOwnership
3 WorksFor           
4 FamilyRelationship 
Note

There are 4 types of edges, namely:

  • Shareholdership

  • WorksFor

  • BeneficialOwnership

  • FamilyRelationship

Nodes

unique_transaction_types_nodes <- mc3_nodes %>%
  select(type) %>%
  distinct()

# Display the unique transaction types
print(unique_transaction_types_nodes)
# A tibble: 8 × 1
  type            
  <chr>           
1 Company         
2 LogisticsCompany
3 FishingCompany  
4 FinancialCompany
5 NewsCompany     
6 NGO             
7 Person          
8 CEO             
Note

There are 8 types of nodes, namely:

  • Company

  • LogisticsCompany

  • FishingCompany

  • FinancialCompany

  • NewsCompany

  • NGO

  • Person

  • CEO

Extract all the source and target nodes

Extract all the source and target nodes, then, drop any unmatched nodes

id1 <- mc3_edges %>%
  select(source) %>%
  rename(id = source)

id2 <- mc3_edges %>%
  select(target) %>%
  rename(id = target)

mc3_nodes1 <- rbind(id1, id2) %>%
  distinct() %>%
  left_join(mc3_nodes, by = c("id" = "id")) %>%
  mutate(unmatched = "drop")

Verify results

print(mc3_nodes1)
# A tibble: 60,489 × 6
   id                     country     type     revenue ProductServices unmatched
   <chr>                  <chr>       <chr>      <dbl> <chr>           <chr>    
 1 4. SeaCargo Ges.m.b.H. Oceanus     Logisti…  23304. Tuna, sword fi… drop     
 2 9. RiverLine CJSC      Oceanus     Company   50134. Unknown         drop     
 3 Aaron Acosta           Mawalara    Person        0  <NA>            drop     
 4 Aaron Allen            Galduzim    Person        0  <NA>            drop     
 5 Aaron Austin           Kethilim    Person        0  <NA>            drop     
 6 Aaron Baker            Azurionix   Person        0  <NA>            drop     
 7 Aaron Barry            Kondanovia  Person        0  <NA>            drop     
 8 Aaron Bauer            Rio Solovia Person        0  <NA>            drop     
 9 Aaron Bishop           Osterivaro  Person        0  <NA>            drop     
10 Aaron Bolton           n.a.        Person        0  <NA>            drop     
# ℹ 60,479 more rows

Create Graph Object

Create graph object and calculate centrality measures

mc3_graph <- tbl_graph(nodes = mc3_nodes1, edges = mc3_edges, directed = TRUE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
  theme_graph()
## List of 136
##  $ line                            :List of 6
##   ..$ colour       : chr "black"
##   ..$ linewidth    : num 0.5
##   ..$ linetype     : num 1
##   ..$ lineend      : chr "butt"
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ rect                            :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : chr "black"
##   ..$ linewidth    : num 0.5
##   ..$ linetype     : num 1
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ text                            :List of 11
##   ..$ family       : chr "Arial Narrow"
##   ..$ face         : chr "plain"
##   ..$ colour       : chr "black"
##   ..$ size         : num 11
##   ..$ hjust        : num 0.5
##   ..$ vjust        : num 0.5
##   ..$ angle        : num 0
##   ..$ lineheight   : num 0.9
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : logi FALSE
##   ..$ inherit.blank: logi FALSE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ title                           : NULL
##  $ aspect.ratio                    : NULL
##  $ axis.title                      : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.title.x                    :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 2.75points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x.top                :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 2.75points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x.bottom             : NULL
##  $ axis.title.y                    :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : num 90
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 2.75points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.y.left               : NULL
##  $ axis.title.y.right              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.75points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text                       : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.text.x                     :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 2.2points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x.top                 :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 2.2points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x.bottom              : NULL
##  $ axis.text.y                     :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 1
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.y.left                : NULL
##  $ axis.text.y.right               :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.2points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.theta                 : NULL
##  $ axis.text.r                     :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0.5
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 2.2points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.ticks                      : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.ticks.x                    : NULL
##  $ axis.ticks.x.top                : NULL
##  $ axis.ticks.x.bottom             : NULL
##  $ axis.ticks.y                    : NULL
##  $ axis.ticks.y.left               : NULL
##  $ axis.ticks.y.right              : NULL
##  $ axis.ticks.theta                : NULL
##  $ axis.ticks.r                    : NULL
##  $ axis.minor.ticks.x.top          : NULL
##  $ axis.minor.ticks.x.bottom       : NULL
##  $ axis.minor.ticks.y.left         : NULL
##  $ axis.minor.ticks.y.right        : NULL
##  $ axis.minor.ticks.theta          : NULL
##  $ axis.minor.ticks.r              : NULL
##  $ axis.ticks.length               : 'simpleUnit' num 2.75points
##   ..- attr(*, "unit")= int 8
##  $ axis.ticks.length.x             : NULL
##  $ axis.ticks.length.x.top         : NULL
##  $ axis.ticks.length.x.bottom      : NULL
##  $ axis.ticks.length.y             : NULL
##  $ axis.ticks.length.y.left        : NULL
##  $ axis.ticks.length.y.right       : NULL
##  $ axis.ticks.length.theta         : NULL
##  $ axis.ticks.length.r             : NULL
##  $ axis.minor.ticks.length         : 'rel' num 0.75
##  $ axis.minor.ticks.length.x       : NULL
##  $ axis.minor.ticks.length.x.top   : NULL
##  $ axis.minor.ticks.length.x.bottom: NULL
##  $ axis.minor.ticks.length.y       : NULL
##  $ axis.minor.ticks.length.y.left  : NULL
##  $ axis.minor.ticks.length.y.right : NULL
##  $ axis.minor.ticks.length.theta   : NULL
##  $ axis.minor.ticks.length.r       : NULL
##  $ axis.line                       : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.line.x                     : NULL
##  $ axis.line.x.top                 : NULL
##  $ axis.line.x.bottom              : NULL
##  $ axis.line.y                     : NULL
##  $ axis.line.y.left                : NULL
##  $ axis.line.y.right               : NULL
##  $ axis.line.theta                 : NULL
##  $ axis.line.r                     : NULL
##  $ legend.background               : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ legend.margin                   : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
##   ..- attr(*, "unit")= int 8
##  $ legend.spacing                  : 'simpleUnit' num 11points
##   ..- attr(*, "unit")= int 8
##  $ legend.spacing.x                : NULL
##  $ legend.spacing.y                : NULL
##  $ legend.key                      : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ legend.key.size                 : 'simpleUnit' num 1.2lines
##   ..- attr(*, "unit")= int 3
##  $ legend.key.height               : NULL
##  $ legend.key.width                : NULL
##  $ legend.key.spacing              : 'simpleUnit' num 5.5points
##   ..- attr(*, "unit")= int 8
##  $ legend.key.spacing.x            : NULL
##  $ legend.key.spacing.y            : NULL
##  $ legend.frame                    : NULL
##  $ legend.ticks                    : NULL
##  $ legend.ticks.length             : 'rel' num 0.2
##  $ legend.axis.line                : NULL
##  $ legend.text                     :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.text.position            : NULL
##  $ legend.title                    :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.title.position           : NULL
##  $ legend.position                 : chr "right"
##  $ legend.position.inside          : NULL
##  $ legend.direction                : NULL
##  $ legend.byrow                    : NULL
##  $ legend.justification            : chr "center"
##  $ legend.justification.top        : NULL
##  $ legend.justification.bottom     : NULL
##  $ legend.justification.left       : NULL
##  $ legend.justification.right      : NULL
##  $ legend.justification.inside     : NULL
##  $ legend.location                 : NULL
##  $ legend.box                      : NULL
##  $ legend.box.just                 : NULL
##  $ legend.box.margin               : 'margin' num [1:4] 0cm 0cm 0cm 0cm
##   ..- attr(*, "unit")= int 1
##  $ legend.box.background           : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ legend.box.spacing              : 'simpleUnit' num 11points
##   ..- attr(*, "unit")= int 8
##   [list output truncated]
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi TRUE
##  - attr(*, "validate")= logi TRUE

Network Graph

Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
ℹ Please use the `transform` argument instead.

# Display the network graph
ggraph(mc3_graph, layout = "fr") + # Using Fruchterman-Reingold layout
  geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
  geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
  scale_color_viridis_c() + # Use viridis color scale
  theme_void() + # Use a void theme
  labs(title = "Refined Network Graph of Atypical Business Transactions",
       subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
       caption = "Data Source: mc3.json") # Add titles and captions
Note

The grey circular portion in the center of the network graph does not represent any specific data or entities. It is a visual byproduct resulting from the dense clustering of nodes and edges in that central region. This effect is particularly noticeable in dense, highly interconnected network visualizations where nodes and edges are concentrated in a small space.

Thus, we filter the nodes to refine the graph.

Refined Network Graph

Top Nodes

Identify top 20 nodes by betweenness centrality

# Identify top nodes by betweenness centrality
top_nodes <- mc3_graph %>% 
  as_tibble()

# Identify top 20 nodes by betweenness centrality
top_nodes2 <- top_nodes %>% 
  top_n(20, wt = betweenness_centrality)

List of most active people and businesses

top_nodes2
# A tibble: 21 × 8
   id     country type  revenue ProductServices unmatched betweenness_centrality
   <chr>  <chr>   <chr>   <dbl> <chr>           <chr>                      <dbl>
 1 Corte… Mawala… Comp…  6.99e3 Finish carpent… drop                          22
 2 Evans… Oceanus Fish…  5.50e4 Processing and… drop                          29
 3 Fried… Mawand… Comp…  1.64e4 Grocery produc… drop                          38
 4 Gvard… Nalaki… Comp…  6.85e4 Shipping servi… drop                          33
 5 Hill … Oceanus Comp…  4.75e3 Unknown         drop                          31
 6 Howel… Mawand… Comp…  7.74e6 High-grade met… drop                          54
 7 Johns… Valtal… Comp…  3.35e4 Machinery and … drop                          33
 8 Kaise… Isla S… Comp…  2.32e4 Canned and cur… drop                          26
 9 King … Oceanus Comp…  0      Operation of i… drop                          29
10 Lane … Imazam  Fish…  4.80e3 Fish and seafo… drop                          33
# ℹ 11 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
Note

It is likely that these entities on the top 10 list are big players in the industry and control information and resources.

High betweenness centrality means that a node plays a more crucial role in connecting other nodes. It can be an indicator of:

  • Brokerage Role: Nodes with high betweenness centrality often act as bridges or intermediaries between different parts of the network. They control the flow of information, resources, or interactions between other nodes.

  • Control and Influence: Nodes with high betweenness centrality have the potential to control the flow of information or resources in the network. They may have more influence or power over the network dynamics compared to other nodes.

Plot refined graph

# Extract IDs of top nodes
# Extract IDs of top nodes
top_node_ids <- top_nodes$id

# Filter the graph to include only top nodes and their incident edges
top_graph <- mc3_graph %>%
  activate(nodes) %>%
  filter(id %in% top_node_ids) %>%
  activate(edges) %>%
  filter(edge_is_incident(top_node_ids))

# Plot the network graph with top nodes
ggraph(top_graph, layout = "fr") + 
  geom_edge_link(aes(edge_alpha = 0.1, edge_width = 0.1)) +
  geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) +
  scale_color_viridis_c() +
  theme_void() +
  labs(title = "Top 20 Nodes Network Graph",
       subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
       caption = "Data Source: mc3.json")
Note

We can see that for the top nodes, they are highly interconnected. To the extent that the graph becomes less interpretable.

Centrality Over Time

Plot Graph

# Assuming you have a function to calculate centrality for each year
calculate_centrality_over_time <- function(nodes, edges, time_unit = "year")
  {
  edges <- edges %>%
    mutate(period = as.Date(paste0(tyear, "-01-01")))
  
  centrality_results <- edges %>%
    group_by(period) %>%
    do({
      current_edges <- .
      current_nodes <- nodes %>% filter(id %in% unique(c(current_edges$source, current_edges$target)))
      graph <- tbl_graph(nodes = current_nodes, edges = current_edges, directed = TRUE)
      graph %>%
        mutate(betweenness = centrality_betweenness()) %>%
        as_tibble() %>%
        summarise(mean_betweenness = mean(betweenness, na.rm = TRUE))
    }) %>%
    ungroup()
  
  return(centrality_results)
}

# Calculate centrality measures over time
centrality_over_time <- calculate_centrality_over_time(mc3_nodes1, mc3_edges)

# Plot centrality measures over time
ggplot(centrality_over_time, aes(x = period, y = mean_betweenness)) +
  geom_line(color = "red") +
  labs(title = "Average Betweenness Centrality Per Year",
       x = "Year",
       y = "Mean Betweenness Centrality") +
  theme_minimal()

Note

This graph shows that there is a increasing trend of the average betweenness centrality per year over time.

From around 2000 to 2020, the average betweenness centrality remains relatively low and fluctuates within a small range, indicating a stable network structure during this period. However, starting around 2025, there is a sharp and dramatic increase in the average betweenness centrality.

This sudden rise suggests a significant change in the network dynamics, where certain nodes or entities are becoming increasingly important as bridges or intermediaries connecting different parts of the network. Such a drastic increase could potentially indicate the emergence of new influential players, changes in transaction patterns, or the formation of new connections and pathways within the network.

The rapid growth in average betweenness centrality implies that the network structure is becoming more centralized, with a smaller number of nodes acting as critical hubs or gatekeepers, controlling the flow of information or transactions within the network.

Key Influencers

Extract key influencers and their edges

# Filter mc3_edges to keep rows where source ID is in top_nodes2
keypersonnel <- mc3_edges %>%
  filter(source %in% top_nodes2$id)

Key influencers of the industry

unique(keypersonnel$source)
 [1] "Cortez LLC"                       "Evans-Pearson"                   
 [3] "Friedman, Gibson and Garcia"      "GvardeyskAmerica Shipping Plc"   
 [5] "Hill PLC"                         "Howell LLC"                      
 [7] "Johnson, Perez and Salinas"       "Kaiser, Warren and Shepard"      
 [9] "King and Sons"                    "Lane Group"                      
[11] "Lee-Ramirez"                      "Mcpherson-Wright"                
[13] "NamRiver Transit A/S"             "Osborne, Saunders and Brown"     
[15] "Patel-Miller"                     "Ramos, Jordan and Stewart"       
[17] "Rivera, Lee and Carroll"          "Russell and Sons"                
[19] "Stein, Taylor and Williams"       "StichtingMarine Shipping Company"
[21] "Vasquez-Gonzalez"                
Note

The key influencers are:

  • Cortez LLC

  • Evans-Pearson

  • Friedman, Gibson and Garcia

  • GvardeyskAmerica Shipping Plc

  • Hill PLC

  • Howell LLC

  • Johnson, Perez and Salinas

  • Kaiser, Warren and Shepard

  • King and Sons

  • Lane Group

  • Lee-Ramirez

  • Mcpherson-Wright

  • NamRiver Transit A/S

  • Osborne, Saunders and Brown

  • Patel-Miller

  • Ramos, Jordan and Stewart

  • Rivera, Lee and Carroll Russell and Sons

  • Stein, Taylor and Williams

  • StichtingMarine Shipping Company

  • Vasquez-Gonzalez

Relationship between influencers and their links

unique(keypersonnel$type)
[1] "Shareholdership"
Note

There is only 1 type of relationship between the influencers and their links. The influencers are shareholders of those they are linked to.

Network Graph

Create Graph Object

# Create a nodes dataframe from the unique source and target values
nodes <- unique(c(keypersonnel$source, keypersonnel$target)) %>%
  data.frame(name = .)

# Create the graph object using tbl_graph
graph_data <- tbl_graph(nodes = nodes,
                        edges = keypersonnel %>%
                          rename(from = source, to = target),
                        directed = TRUE)

Plot Graph

# Plot the directed graph
ggraph(graph_data, layout = "fr") +  # Using Fruchterman-Reingold layout
  geom_edge_link(aes(label = as.character(tyear)),  # Only label with tyear
                 arrow = arrow(length = unit(4, 'mm')),  # Add arrows to indicate direction
                 end_cap = circle(3, 'mm'),  # Cap the end of the edges with a circle
                 label_dodge = unit(2, "mm"),  # Adjust label position to avoid overlap
                 label_size = 3,  # Set label size
                 edge_width = 0.8,  # Set edge width
                 edge_alpha = 0.8) +  # Set edge transparency
  geom_node_point(size = 5, color = "blue") +  # Customize node appearance
  geom_node_text(aes(label = name), vjust = 1.5, size = 4) +  # Add node labels
  theme_void() +  # Use a void theme
  labs(title = "Directed Network Graph of Key Personnel Transactions",
       subtitle = "Nodes represent unique sources and targets, edges labeled with year",
       caption = "Data Source: keypersonnel")  # Add titles and captions
Note

Most number of links:

Entity Name Number of Links
GvardeyskAmerica Shipping Plc 4
Rivera, Lee and Carroll 3
Cortez LLC 2
Kaiser, Warren and Shepard 2
Mcpherson-Wright 2
Patel-Miller 2
StichtingMarine Shipping Company 2
Vasquez-Gonzalez 2

The earliest link: Lane Group has been the shareholder of Howell LLC since 2020.

The most recent link: GvardeyskAmerica Shipping Plc is the shareholder of ArawakFish Cargo Ges.m.b.H.. since 2034.

Finding atypical Business Relationships

# Creating the graph object
mc3_graph1 <- tbl_graph(nodes = mc3_nodes1, edges = mc3_edges, directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
  theme_graph()
List of 136
 $ line                            :List of 6
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ lineend      : chr "butt"
  ..$ arrow        : logi FALSE
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_line" "element"
 $ rect                            :List of 5
  ..$ fill         : chr "white"
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_rect" "element"
 $ text                            :List of 11
  ..$ family       : chr "Arial Narrow"
  ..$ face         : chr "plain"
  ..$ colour       : chr "black"
  ..$ size         : num 11
  ..$ hjust        : num 0.5
  ..$ vjust        : num 0.5
  ..$ angle        : num 0
  ..$ lineheight   : num 0.9
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : logi FALSE
  ..$ inherit.blank: logi FALSE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ title                           : NULL
 $ aspect.ratio                    : NULL
 $ axis.title                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.title.x                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.75points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.top                :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.75points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.bottom             : NULL
 $ axis.title.y                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num 90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.75points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.y.left               : NULL
 $ axis.title.y.right              :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num -90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.75points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text                       : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.text.x                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.2points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.top                 :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.2points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.bottom              : NULL
 $ axis.text.y                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 1
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.y.left                : NULL
 $ axis.text.y.right               :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.theta                 : NULL
 $ axis.text.r                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0.5
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.ticks                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.ticks.x                    : NULL
 $ axis.ticks.x.top                : NULL
 $ axis.ticks.x.bottom             : NULL
 $ axis.ticks.y                    : NULL
 $ axis.ticks.y.left               : NULL
 $ axis.ticks.y.right              : NULL
 $ axis.ticks.theta                : NULL
 $ axis.ticks.r                    : NULL
 $ axis.minor.ticks.x.top          : NULL
 $ axis.minor.ticks.x.bottom       : NULL
 $ axis.minor.ticks.y.left         : NULL
 $ axis.minor.ticks.y.right        : NULL
 $ axis.minor.ticks.theta          : NULL
 $ axis.minor.ticks.r              : NULL
 $ axis.ticks.length               : 'simpleUnit' num 2.75points
  ..- attr(*, "unit")= int 8
 $ axis.ticks.length.x             : NULL
 $ axis.ticks.length.x.top         : NULL
 $ axis.ticks.length.x.bottom      : NULL
 $ axis.ticks.length.y             : NULL
 $ axis.ticks.length.y.left        : NULL
 $ axis.ticks.length.y.right       : NULL
 $ axis.ticks.length.theta         : NULL
 $ axis.ticks.length.r             : NULL
 $ axis.minor.ticks.length         : 'rel' num 0.75
 $ axis.minor.ticks.length.x       : NULL
 $ axis.minor.ticks.length.x.top   : NULL
 $ axis.minor.ticks.length.x.bottom: NULL
 $ axis.minor.ticks.length.y       : NULL
 $ axis.minor.ticks.length.y.left  : NULL
 $ axis.minor.ticks.length.y.right : NULL
 $ axis.minor.ticks.length.theta   : NULL
 $ axis.minor.ticks.length.r       : NULL
 $ axis.line                       : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.line.x                     : NULL
 $ axis.line.x.top                 : NULL
 $ axis.line.x.bottom              : NULL
 $ axis.line.y                     : NULL
 $ axis.line.y.left                : NULL
 $ axis.line.y.right               : NULL
 $ axis.line.theta                 : NULL
 $ axis.line.r                     : NULL
 $ legend.background               : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.margin                   : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.spacing                  : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
 $ legend.spacing.x                : NULL
 $ legend.spacing.y                : NULL
 $ legend.key                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.key.size                 : 'simpleUnit' num 1.2lines
  ..- attr(*, "unit")= int 3
 $ legend.key.height               : NULL
 $ legend.key.width                : NULL
 $ legend.key.spacing              : 'simpleUnit' num 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.key.spacing.x            : NULL
 $ legend.key.spacing.y            : NULL
 $ legend.frame                    : NULL
 $ legend.ticks                    : NULL
 $ legend.ticks.length             : 'rel' num 0.2
 $ legend.axis.line                : NULL
 $ legend.text                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.text.position            : NULL
 $ legend.title                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.title.position           : NULL
 $ legend.position                 : chr "right"
 $ legend.position.inside          : NULL
 $ legend.direction                : NULL
 $ legend.byrow                    : NULL
 $ legend.justification            : chr "center"
 $ legend.justification.top        : NULL
 $ legend.justification.bottom     : NULL
 $ legend.justification.left       : NULL
 $ legend.justification.right      : NULL
 $ legend.justification.inside     : NULL
 $ legend.location                 : NULL
 $ legend.box                      : NULL
 $ legend.box.just                 : NULL
 $ legend.box.margin               : 'margin' num [1:4] 0cm 0cm 0cm 0cm
  ..- attr(*, "unit")= int 1
 $ legend.box.background           : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.box.spacing              : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
  [list output truncated]
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi TRUE
 - attr(*, "validate")= logi TRUE
# Identify top nodes by betweenness centrality
top_nodes <- mc3_graph1 %>% 
  as_tibble() %>% 
  filter(betweenness_centrality >= 3000000)
head(top_nodes, n = 50)
# A tibble: 50 × 8
   id     country type  revenue ProductServices unmatched betweenness_centrality
   <chr>  <chr>   <chr>   <dbl> <chr>           <chr>                      <dbl>
 1 Aaron… n.a.    Pers…       0 <NA>            drop                    7756175.
 2 Alan … n.a.    Pers…       0 <NA>            drop                    7756175.
 3 Aleja… Valtal… Pers…       0 <NA>            drop                    7231016.
 4 Alex … Arvaros Pers…       0 <NA>            drop                    3442208 
 5 Alexa… n.a.    Pers…       0 <NA>            drop                    7756175.
 6 Amand… Mawala… Pers…       0 <NA>            drop                    7944565.
 7 Amede… Novarc… Pers…       0 <NA>            drop                    4211301.
 8 Amy A… Ariuzi… CEO         0 <NA>            drop                    3272037.
 9 Amy B… Kondan… Pers…       0 <NA>            drop                    4044917.
10 Andre… Kondan… Pers…       0 <NA>            drop                    3914655.
# ℹ 40 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
# Filter edges for atypical business transactions
atypical_edges <- mc3_edges %>%
  filter(type %in% c("Shareholdership", "WorksFor", "BeneficialOwnership", "FamilyRelationship"))

# Extract nodes that are part of these transactions
atypical_nodes <- mc3_nodes %>%
  filter(id %in% unique(c(atypical_edges$source, atypical_edges$target)))

# Create the graph object with filtered data
atypical_graph <- tbl_graph(nodes = atypical_nodes, edges = atypical_edges, directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())

# Define higher centrality thresholds for more filtering
betweenness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(betweenness_centrality), 0.995)
closeness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(closeness_centrality), 0.995)

# Filter nodes based on higher centrality thresholds
filtered_graph <- atypical_graph %>%
  activate(nodes) %>%
  filter(betweenness_centrality >= betweenness_threshold | closeness_centrality >= closeness_threshold) %>%
  activate(edges) %>%
  filter(edge_is_between())

# Display the refined network graph
ggraph(filtered_graph, layout = "fr") + # Using Fruchterman-Reingold layout
  geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
  geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
  scale_color_viridis_c() + # Use viridis color scale
  theme_void() + # Use a void theme
  labs(title = "Refined Network Graph of Atypical Business Transactions",
       subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
       caption = "Data Source: mc3.json") # Add titles and captions

Step 2 filtering out false positives with a centrality threshold

# Define higher centrality thresholds for more filtering
betweenness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(betweenness_centrality), 0.995)
closeness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(closeness_centrality), 0.995)

# Filter nodes based on higher centrality thresholds
filtered_graph <- atypical_graph %>%
  activate(nodes) %>%
  filter(betweenness_centrality >= betweenness_threshold | closeness_centrality >= closeness_threshold) %>%
  activate(edges) %>%
  filter(edge_is_between())

# Verify the filtered graph object
print(filtered_graph)
# A tbl_graph: 2557 nodes and 2117 edges
#
# A bipartite multigraph with 1146 components
#
# Edge Data: 2,117 × 5 (active)
    from    to type                tyear weights
   <int> <int> <chr>               <int>   <int>
 1     3  1199 BeneficialOwnership  2019       1
 2    25  1199 BeneficialOwnership  2021       1
 3    29  1199 BeneficialOwnership  2019       1
 4   148  1199 BeneficialOwnership  2008       1
 5   150  1199 BeneficialOwnership  2029       1
 6   175  1199 BeneficialOwnership  2012       1
 7   229  1199 BeneficialOwnership  2031       1
 8   316  1199 BeneficialOwnership  2034       1
 9   448  1199 BeneficialOwnership  2021       1
10   479  1199 BeneficialOwnership  2019       1
# ℹ 2,107 more rows
#
# Node Data: 2,557 × 7
  id                country type  revenue ProductServices betweenness_centrality
  <chr>             <chr>   <chr>   <dbl> <chr>                            <dbl>
1 Abbott-Harrison   Uzifri… Comp…      0  Food products                24317375.
2 Adams, Hernandez… Rio Is… Comp…      0  Unknown                             0 
3 Adams-Byrd        Nalako… Comp… 147540. Offers a wide …               4730169.
# ℹ 2,554 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
# Display the refined network graph
ggraph(filtered_graph, layout = "fr") + # Using Fruchterman-Reingold layout
  geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
  geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
  scale_color_viridis_c() + # Use viridis color scale
  theme_void() + # Use a void theme
  labs(title = "Refined Network Graph of Atypical Business Transactions",
       subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
       caption = "Data Source: mc3.json") # Add titles and captions

Refining visualisation

We first will need to sift out edges with some form of ownership or working relationship. Following that, the centrality threshold will need to be defined so that most false positives are being filtered out through it and then finally using slice to only take the top 50 edges.

Interpretation

  1. Node Size (Betweenness Centrality):

    • Larger nodes represent entities with higher betweenness centrality. These are nodes that frequently act as intermediaries in the shortest paths between other nodes. They are crucial for the flow of information or transactions in the network.
  2. Node Color (Closeness Centrality):

    • The color of the nodes indicates their closeness centrality. Closeness centrality measures how close a node is to all other nodes in the network. Nodes with higher closeness centrality (often lighter colors) can quickly interact with other nodes.
  3. Labeled Nodes:

    • The labels represent the most central nodes based on betweenness centrality. These are key entities in the network, acting as significant intermediaries in business transactions.
  4. Node Distribution:

    • The dense cluster of nodes in the center indicates a high level of interaction among these entities. Peripheral nodes might indicate entities that are less central but still part of significant transactions.

Observations

  • Highly Central Entities:

    • Entities such as “Augustin Le Texier,” “Sullivan and Sons,” “Tullio Jacuzzi,” and others labeled on the graph are highly central in terms of their ability to broker connections between other nodes.
  • Dense Core:

    • The central area of the graph is densely packed with nodes, indicating a high degree of interconnection among many entities. This suggests a tightly-knit network where many transactions or interactions occur.
  • Peripheral Nodes:

    • Nodes on the periphery, though smaller and less central, still play a role in the network. Their interactions may be with the core or other peripheral nodes.

Potential Actions

  1. Focus on Key Players:

    • Entities with high betweenness and closeness centrality (large, brightly colored nodes) are critical for network connectivity. These entities might be key influencers or major players in business transactions.
  2. Investigate Clusters:

    • The dense central cluster indicates a closely connected group of entities. Investigating these clusters can reveal sub-networks or communities within the larger network.

Cluster Investigation

# Perform clustering on the graph
clustered_graph <- mc3_graph1 %>%
  mutate(cluster = as.factor(group_louvain()))

# Calculate edge betweenness centrality for the entire graph
edge_betweenness_vals <- edge_betweenness(clustered_graph)

# Add edge betweenness centrality to the graph
clustered_graph <- clustered_graph %>%
  activate(edges) %>%
  mutate(edge_betweenness = edge_betweenness_vals)

# Highlight key nodes with betweenness centrality >= 3,000,000 and their clusters
key_nodes_and_clusters <- clustered_graph %>%
  activate(nodes) %>%
  filter(betweenness_centrality >= 3000000) %>%
  pull(cluster) %>%
  unique()

# Filter the graph to include only the key nodes and their clusters
filtered_graph <- clustered_graph %>%
  activate(nodes) %>%
  filter(cluster %in% key_nodes_and_clusters)

# Calculate edge betweenness sum for the filtered graph
filtered_graph <- filtered_graph %>%
  activate(edges) %>%
  mutate(edge_betweenness_sum = .N()$betweenness_centrality[from] + .N()$betweenness_centrality[to])

# Extract layout data for node positions
graph_layout <- create_layout(filtered_graph, layout = "fr")

# Enhanced plot with labels, colors, and improved legend
p <- ggraph(graph_layout) +  # Using precomputed layout
  geom_edge_link(aes(width = edge_betweenness_sum / max(edge_betweenness_sum), 
                     alpha = edge_betweenness_sum / max(edge_betweenness_sum), 
                     tooltip = edge_betweenness_sum), 
                 color = "gray") + # Customize edge appearance
  geom_node_point(aes(size = betweenness_centrality, color = cluster, 
                      alpha = ifelse(betweenness_centrality >= 3000000, 1, 0.4),
                      tooltip = paste("ID:", id, "<br>Cluster:", cluster, "<br>Betweenness:", betweenness_centrality))) +  # Add tooltip information
  geom_node_text(aes(label = ifelse(betweenness_centrality >= 3000000, id, "")),
                 vjust = 1.5, hjust = 1.5, check_overlap = TRUE) +
  scale_size_continuous(range = c(1, 10)) +
  scale_color_manual(values = c("1" = "blue", "2" = "green", "3" = "red", "4" = "yellow", "5" = "purple", "6" = "orange", "7" = "pink", "8" = "cyan")) +  # Customize based on the number of clusters
  guides(edge_alpha = guide_legend(title = "Edge Alpha"),
         size = guide_legend(title = "Betweenness Centrality"),
         color = guide_legend(title = "Cluster")) +
  theme_graph() +
  theme(legend.position = "bottom") +
  labs(title = "Enhanced Company Network Visualization",
       subtitle = "Key Nodes Highlighted with Betweenness Centrality >= 3,000,000")
Warning in geom_edge_link(aes(width =
edge_betweenness_sum/max(edge_betweenness_sum), : Ignoring unknown aesthetics:
tooltip
Warning in geom_node_point(aes(size = betweenness_centrality, color = cluster,
: Ignoring unknown aesthetics: tooltip
# Convert ggraph plot to plotly object
p_plotly <- ggplotly(p, tooltip = "tooltip")
Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomEdgePath() has yet to be implemented in plotly.
  If you'd like to see this geom implemented,
  Please open an issue with your example code at
  https://github.com/ropensci/plotly/issues
# Display the plotly interactive plot
p_plotly

Q3

By analyzing the ownership structure, we tracked changes in most influential individuals (VIP) networks over time, identifying key individuals with increasing influence.

Part 1: Data Wrangling

Split the nodes into people and companies, and filter ownership-related edges

# Select crucial columns and fill missing values where appropriate
cleaned_nodes <- mc3_data[["nodes"]] %>%
  select(id, type, country, HeadOfOrg, revenue,ProductServices,PointOfContact,founding_date,TradeDescription,dob,
         `_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`) %>%
  mutate(HeadOfOrg = ifelse(is.na(HeadOfOrg), "Unknown", HeadOfOrg),
         revenue = ifelse(is.na(revenue), 0, revenue))

# Handle missing values in links
# Select crucial columns and fill missing values where appropriate
cleaned_links <- mc3_data[["links"]] %>%
  select(key,source, target, type, start_date, end_date, `_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`) %>%
  mutate(start_date = ifelse(is.na(start_date), "Unknown", start_date),
         end_date = ifelse(is.na(end_date), "Unknown", end_date))

# Ensure proper data types
cleaned_nodes <- cleaned_nodes %>%
  mutate(
    id = as.character(id),
    type = as.character(type),
    country = as.character(country),
    HeadOfOrg = as.character(HeadOfOrg),
    revenue = as.numeric(revenue),
    `_last_edited_by` = as.character(`_last_edited_by`),
    `_last_edited_date` = as.character(`_last_edited_date`),
    `_date_added` = as.character(`_date_added`),
    `_raw_source` = as.character(`_raw_source`),
    `_algorithm` = as.character(`_algorithm`)
  )

cleaned_links <- cleaned_links %>%
 mutate(
    source = as.character(source),
    target = as.character(target),
    type = as.character(type),
    start_date = as.character(start_date),
    end_date = as.character(end_date),
    `_last_edited_by` = as.character(`_last_edited_by`),
    `_last_edited_date` = as.character(`_last_edited_date`),
    `_date_added` = as.character(`_date_added`),
    `_raw_source` = as.character(`_raw_source`),
    `_algorithm` = as.character(`_algorithm`)
  )

Check for data types

# Ensure correct data types for nodes
cleaned_nodes <- cleaned_nodes %>%
  mutate(
    id = as.character(id),
    type = as.character(type),
    country = as.character(country),
    HeadOfOrg = as.character(HeadOfOrg),
    revenue = as.numeric(revenue),
      dob = as.POSIXct(dob, format="%Y-%m-%dT%H:%M:%S"),
    `_last_edited_by` = as.character(`_last_edited_by`),
    `_last_edited_date` = as.POSIXct(`_last_edited_date`, format="%Y-%m-%dT%H:%M:%S"),
    founding_date=as.POSIXct(founding_date, format="%Y-%m-%dT%H:%M:%S"),
    `_date_added` = as.POSIXct(`_date_added`, format="%Y-%m-%dT%H:%M:%S"),
    `_raw_source` = as.character(`_raw_source`),
    `_algorithm` = as.character(`_algorithm`)
    
  )

# Ensure correct data types for links
cleaned_links <- cleaned_links %>%
 mutate(
    source = as.character(source),
    target = as.character(target),
    type = as.character(type),
    start_date = as.POSIXct(start_date, format="%Y-%m-%dT%H:%M:%S"),
    end_date = as.POSIXct(end_date, format="%Y-%m-%dT%H:%M:%S"),
    `_last_edited_by` = as.character(`_last_edited_by`),
    `_last_edited_date` = as.POSIXct(`_last_edited_date`, format="%Y-%m-%dT%H:%M:%S"),
    `_date_added` = as.POSIXct(`_date_added`, format="%Y-%m-%dT%H:%M:%S"),
    `_raw_source` = as.character(`_raw_source`),
    `_algorithm` = as.character(`_algorithm`)
  )

# Print cleaned data for inspection
glimpse(cleaned_nodes)
Rows: 60,520
Columns: 15
$ id                  <chr> "Abbott, Mcbride and Edwards", "Abbott-Gomez", "Ab…
$ type                <chr> "Entity.Organization.Company", "Entity.Organizatio…
$ country             <chr> "Uziland", "Mawalara", "Uzifrica", "Islavaragon", …
$ HeadOfOrg           <chr> "Émilie-Susan Benoit", "Honoré Lemoine", "Jules La…
$ revenue             <dbl> 5994.73, 71766.67, 0.00, 0.00, 4746.67, 46566.67, …
$ ProductServices     <chr> "Unknown", "Furniture and home accessories", "Food…
$ PointOfContact      <chr> "Rebecca Lewis", "Michael Lopez", "Steven Robertso…
$ founding_date       <dttm> 1954-04-24, 2009-06-12, 2029-12-15, 1972-02-16, 1…
$ TradeDescription    <chr> "Unknown", "Abbott-Gomez is a leading manufacturer…
$ dob                 <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `_last_edited_by`   <chr> "Pelagia Alethea Mordoch", "Pelagia Alethea Mordoc…
$ `_last_edited_date` <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2…
$ `_date_added`       <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2…
$ `_raw_source`       <chr> "Existing Corporate Structure Data", "Existing Cor…
$ `_algorithm`        <chr> "Automatic Import", "Automatic Import", "Automatic…
glimpse(cleaned_links)
Rows: 75,817
Columns: 11
$ key                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ source              <chr> "Avery Inc", "Berger-Hayes", "Bowers Group", "Bowm…
$ target              <chr> "Allen, Nichols and Thompson", "Jensen, Morris and…
$ type                <chr> "Event.Owns.Shareholdership", "Event.Owns.Sharehol…
$ start_date          <dttm> 2016-10-29, 2035-06-03, 2028-11-20, 2024-09-04, 2…
$ end_date            <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `_last_edited_by`   <chr> "Pelagia Alethea Mordoch", "Niklaus Oberon", "Pela…
$ `_last_edited_date` <dttm> 2035-01-01, 2035-07-15, 2035-01-01, 2035-01-01, 2…
$ `_date_added`       <dttm> 2035-01-01, 2035-07-15, 2035-01-01, 2035-01-01, 2…
$ `_raw_source`       <chr> "Existing Corporate Structure Data", "Oceanus Corp…
$ `_algorithm`        <chr> "Automatic Import", "Manual Entry", "Automatic Imp…

Changing field name

cleaned_nodes <- cleaned_nodes %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm") 

cleaned_links<- cleaned_links %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm") 

Split ‘type’ column into separate columns

We are going to tidy the type column by creating two columns “entity2,entity3”.

word_list1 <- strsplit(cleaned_nodes$type, "\\.")
max_elements1 <- max(lengths(word_list1))
word_list_padded1 <- lapply(word_list1, 
function(x) c(x, rep(NA, max_elements1 - length(x))))
word_df1 <- do.call(rbind, word_list_padded1)
colnames(word_df1) <- paste0("entity", 1:max_elements1)
word_df1 <- as_tibble(word_df1) %>%
  select(entity2, entity3)
class(word_df1)
[1] "tbl_df"     "tbl"        "data.frame"

The steps below will be used to split text in type column into two columns

word_list <- strsplit(cleaned_links$type, "\\.")
max_elements <- max(lengths(word_list))
word_list_padded <- lapply(word_list, 
function(x) c(x, rep(NA, max_elements - length(x))))
word_df <- do.call(rbind, word_list_padded)
colnames(word_df) <- paste0("event", 1:max_elements)
word_df <- as_tibble(word_df) %>%
  select(event2, event3)
class(word_df)
[1] "tbl_df"     "tbl"        "data.frame"

Since the output above is a matrix, the code chunk above is used to convert word_df into a tibble data.frame.

cleaned_nodes <- cleaned_nodes %>%
  cbind(word_df1)
cleaned_links <- cleaned_links %>%
  cbind(word_df)

The code chunk above appends the extracted columns back to edges tibble data.frame.

write_rds(cleaned_nodes, "data/rds/cleaned_nodes.rds")
write_rds(cleaned_links, "data/rds/cleaned_links.rds")

above code write into R rds file format.

Part 1: Data Wrangling

Split the nodes into people and companies, and filter ownership-related edges

# Split the nodes into people and companies
nodes_people <- cleaned_nodes %>% filter(entity2 == "Person")
nodes_company <- cleaned_nodes %>% filter(entity2 == "Organization")
# Filter the links to include only ownership-related edges
links_owns <- cleaned_links %>% filter(event2 == "Owns")
nodes_people <- nodes_people %>%
  rowwise() %>%
  mutate('no_owns' = sum(links_owns$source == id))

nodes_people$no_owns <- as.numeric(nodes_people$no_owns)
# Calculate the unique counts of 'no_owns' and their corresponding counts and percentages
owns_summary <- nodes_people %>%
  group_by(no_owns) %>%
  summarise(count = n()) %>%
  mutate(percentage = (count / sum(count)) * 100)

# Display the summary
print(owns_summary)
# A tibble: 19 × 3
   no_owns count percentage
     <dbl> <int>      <dbl>
 1       0   147    0.285  
 2       1 46370   89.8    
 3       2  4032    7.81   
 4       3   665    1.29   
 5       4   245    0.474  
 6       5    80    0.155  
 7       6    34    0.0658 
 8       7    21    0.0407 
 9       8    11    0.0213 
10       9     7    0.0136 
11      10     2    0.00387
12      11     4    0.00774
13      12     3    0.00581
14      13     2    0.00387
15      15     1    0.00194
16      18     2    0.00387
17      29     1    0.00194
18      91    18    0.0349 
19      92     4    0.00774

To define and identify influential people based on an ownership threshold. It filters the nodes to keep only those with a significant number of ownerships

# Define the threshold for 'influential'
vip_threshold <- 91

# Filter to keep only influential people and select relevant columns
vip <- nodes_people %>%
  filter(no_owns >= vip_threshold) %>%
  select(id, country, dob, last_edited_date, date_added, no_owns)

# Display the updated vip data frame
glimpse(vip)
Rows: 22
Columns: 6
Rowwise: 
$ id               <chr> "Kelsey Ortega", "Joseph Gentry", "Cynthia Anderson",…
$ country          <chr> "n.a.", "n.a.", "n.a.", "n.a.", "n.a.", "n.a.", "n.a.…
$ dob              <dttm> 1974-11-26, 1980-11-08, 1991-07-23, 2013-10-03, 1981…
$ last_edited_date <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ date_added       <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ no_owns          <dbl> 91, 91, 91, 92, 91, 92, 91, 91, 91, 91, 91, 91, 91, 9…

filter the ownership connections to include only those involving these influential individuals

# Filter links_owns to keep only those connections where the source is in the vip list
vip_connections <- links_owns %>%
  filter(source %in% vip$id)%>%
  select(source, target,start_date,end_date,last_edited_date, date_added)

# Display the updated vip_connections data frame
glimpse(vip_connections)
Rows: 2,006
Columns: 6
$ source           <chr> "Kelsey Ortega", "Kelsey Ortega", "Kelsey Ortega", "K…
$ target           <chr> "Mitchell-Glover", "Anderson, Smith and Weber", "Orr …
$ start_date       <dttm> 2017-08-11, 2028-12-13, 2016-09-18, 2034-12-16, 2032…
$ end_date         <dttm> NA, NA, NA, NA, 2035-07-13, NA, NA, NA, NA, NA, NA, …
$ last_edited_date <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ date_added       <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…

Part 1: Network Graph

Finally, plot the network, highlighting the structure and connections of influential individuals.

# Create graph from VIP connections
g_vip <- graph_from_data_frame(d = vip_connections, directed = TRUE)

# Identify VIPs and Companies
V(g_vip)$type <- ifelse(V(g_vip)$name %in% nodes_people$id, "VIP", "Company")

# Define colors and sizes
V(g_vip)$color <- ifelse(V(g_vip)$type == "VIP", "blue", "orange")
V(g_vip)$size <- ifelse(V(g_vip)$type == "VIP", 8, 5)

# Plot the network
plot(g_vip, vertex.label = NA, vertex.size = V(g_vip)$size, edge.arrow.size = 0.5, 
     vertex.color = V(g_vip)$color, main = "VIP Connections Network")

The plot represents the VIP Connections Network, with blue nodes indicating influential VIPs and orange nodes representing companies they own. Directed edges illustrate ownership, pointing from VIPs to companies. This visualization highlights the dense centrality of VIPs, showcasing their extensive control across multiple companies. By examining these connections, we can infer the structure and extent of VIP influence within the network and help FishEye identify influential individuals within the business network, highlighting ownership structures and central figures. By tracking ownership changes over time, FishEye can pinpoint who controls companies involved in illegal fishing activities.

While this plot provides a static snapshot, in the following we shall create similar plots for different time periods can reveal changes in ownership and influence over time.

Part 2: Temporal Analysis

Aggregate Ownership Changes by Year

change_over_time1 <- links_owns %>%
  group_by(start_date) %>%
  summarize(count = n()) %>%
  drop_na()

links_owns<- links_owns %>%
  mutate(start_year = format(start_date, "%Y"))

# Aggregate ownership changes by year
change_over_time <- links_owns %>%
  group_by(start_year) %>%
  summarize(count = n()) %>%
  drop_na()

Create plots to visualize the changes in ownership over time.

# Plot changes over time
ggplot(change_over_time, aes(x = as.numeric(start_year), y = count)) +
  geom_line() +
  geom_point() +
  labs(title = "Changes in Ownership Over Time",
       x = "Year",
       y = "Number of Ownership Changes") +
  theme_minimal()

Part 2: Network Graph by Year

Given the significant increase in data from 2004 onwards, focusing on every 10 years from 2005 to 2035 would provide a more detailed analysis of changes in ownership and influence.

# Specify the year 
filter_year <- 2005

# Filter vip_connections by start_year
vip_connections_filtered <- vip_connections %>%
  filter(format(start_date, "%Y") == filter_year)

# Create the graph object from the filtered vip_connections
g_vip_filtered <- graph_from_data_frame(d = vip_connections_filtered, directed = TRUE)

# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered)$type <- ifelse(V(g_vip_filtered)$name %in% nodes_people$id, "VIP", "Company")

# Define colors and sizes
V(g_vip_filtered)$color <- ifelse(V(g_vip_filtered)$type == "VIP", "blue", "orange")
V(g_vip_filtered)$size <- ifelse(V(g_vip_filtered)$type == "VIP", 8, 5)

# Plot the network
p2005<-plot(g_vip_filtered, vertex.label = NA, vertex.size = V(g_vip_filtered)$size, edge.arrow.size = 0.5, 
     vertex.color = V(g_vip_filtered)$color, main = paste("VIP Connections Network for", filter_year))

In 2005, the network shows a relatively sparse structure with a moderate number of connections. VIPs (blue nodes) are moderately interconnected, indicating a balanced distribution of influence among several key players.

# Specify the year
filter_year <- 2015

# Filter vip_connections by start_year
vip_connections_filtered_2015 <- vip_connections %>%
  filter(format(start_date, "%Y") == filter_year)

# Create the graph object from the filtered vip_connections
g_vip_filtered_2015 <- graph_from_data_frame(d = vip_connections_filtered_2015, directed = TRUE)

# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2015)$type <- ifelse(V(g_vip_filtered_2015)$name %in% nodes_people$id, "VIP", "Company")

# Define colors and sizes
V(g_vip_filtered_2015)$color <- ifelse(V(g_vip_filtered_2015)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2015)$size <- ifelse(V(g_vip_filtered_2015)$type == "VIP", 8, 5)

# Plot the network
p2015 <- plot(g_vip_filtered_2015, vertex.label = NA, vertex.size = V(g_vip_filtered_2015)$size, edge.arrow.size = 0.5, 
     vertex.color = V(g_vip_filtered_2015)$color, main = paste("VIP Connections Network for", filter_year))

Question 4

For part 1, the focus was on identifying the network associated with SouthSeafood Express Corp and visualizing how this network and competing businesses changed as a result of their illegal fishing behavior.

Part 1: Identify SouthSeafood Express Corp Node

  • Locate the node representing SouthSeafood Express Corp in the network.

  • Create a visualization of the network associated with SouthSeafood Express Corp before any changes.

# Extract edges connected to SouthSeafood Express Corp
southseafood_edges <- cleaned_links %>%
  filter(source == "SouthSeafood Express Corp" | target == "SouthSeafood Express Corp")%>%
  select(source,target,start_date,end_date,event2)

# Ensure all nodes in the edge list are present in the vertex data frame
southseafood_nodes <- cleaned_nodes %>%
  filter(id %in% (c(southseafood_edges$source, southseafood_edges$target)))

# Join edges with nodes to ensure all nodes are present
southseafood_edges <- southseafood_edges %>%
  filter(source %in% southseafood_nodes$id & target %in% southseafood_nodes$id)

# Create graph object for the sub-network
g_southseafood <- graph_from_data_frame(d = southseafood_edges, vertices = southseafood_nodes, directed = TRUE)

# Visualize the initial network
plot(g_southseafood, vertex.label = NA, vertex.size = 5, edge.arrow.size = 0.5, 
     vertex.color = "orange", main = "Network Associated with SouthSeafood Express Corp")

Part 1: Identify Competing Businesses

Identify and highlight competing businesses within the extracted sub-network.

competing_businesses <- cleaned_nodes %>%
  filter(entity3 == "FishingCompany" & id != "SouthSeafood Express Corp")
competing_edges <- cleaned_links %>%
  filter(source %in% competing_businesses$id | target %in% competing_businesses$id) %>%
  select(source, target, start_date, end_date, event2)

# Combine SouthSeafood Express Corp edges with competing businesses edges
combined_edges <- bind_rows(southseafood_edges, competing_edges)

# Extract the combined set of nodes
combined_nodes <- cleaned_nodes %>%
  filter(id %in% c(combined_edges$source, combined_edges$target))
# Create graph object for the combined network
g_combined <- graph_from_data_frame(d = combined_edges, vertices = combined_nodes, directed = TRUE)

Part 1: Analyze Temporal Changes based on start_year

  • Filter the data to show the network before and after the illegal fishing incident(assume the incident happened in 2023)

  • Create visualizations to compare the network structure and connections before and after the incident.

# Assume the accident happened in 2023
incident_year <- 2023

# Filter edges before the incident
edges_before <- combined_edges %>%
  filter(format(start_date, "%Y") < incident_year)

# Filter edges after the incident
edges_after <- combined_edges %>%
  filter(format(start_date, "%Y") >= incident_year)

# Create graph objects for before and after the incident
g_before <- graph_from_data_frame(d = edges_before, vertices = combined_nodes, directed = TRUE)
g_after <- graph_from_data_frame(d = edges_after, vertices = combined_nodes, directed = TRUE)

Part 1: Visualize the Temporal Changes

Identify and highlight significant changes in connections and structure due to the illegal fishing behavior and subsequent closure.

par(mfrow = c(2, 1))

plot_before <- ggraph(g_before, layout = "fr") +
  geom_edge_link(aes(edge_alpha = 0.8), show.legend = FALSE, color = "gray", width = 1) +
  geom_node_point(aes(color = ifelse(name == "SouthSeafood Express Corp", "SouthSeafood", 
                                     ifelse(type == "Entity.Organization.FishingCompany", "FishingCompany", "Other"))), 
                  size = 3, alpha = 0.6, show.legend = TRUE) + # Adjusted alpha for transparency
  scale_color_manual(values = c("SouthSeafood" = "red", "FishingCompany" = "blue", "Other" = "orange"),
                     name = "Type") + # Shortened legend title
  theme_void() +
  theme(legend.position = "bottom") +
  labs(title = "Network Before Incident")

# Show the plot for the network before the incident
plot_before

plot_after <- ggraph(g_after, layout = "fr") +
  geom_edge_link(aes(edge_alpha = 0.8), show.legend = FALSE, color = "gray", width = 1) +
  geom_node_point(aes(color = ifelse(name == "SouthSeafood Express Corp", "SouthSeafood", 
                                     ifelse(type == "Entity.Organization.FishingCompany", "FishingCompany", "Other"))), 
                  size = 3, alpha = 0.6, show.legend = TRUE) + # Adjusted alpha for transparency
  scale_color_manual(values = c("SouthSeafood" = "red", "FishingCompany" = "blue", "Other" = "orange"),
                     name = "Type") + # Shortened legend title
  theme_void() +
  theme(legend.position = "bottom") +
  labs(title = "Network After Incident")

# Show the plot for the network after the incident
plot_after

Note

Observations:

  • The number of blue nodes (fishing companies) appears to have decreased.

  • SouthSeafood Express Corp (red node) remains central but its connections might have changed, indicating possible impact from the incident.

For part 2, since we cannot use revenue data over time, we will focus on identifying which companies potentially benefited from SouthSeafood Express Corp’s legal troubles by analyzing changes in network centrality measures.

Part 2: Calculate Centrality Measures Before and After the Incident

# Calculate degree centrality before the incident
degree_before <- degree(g_before, mode = "all")

# Calculate degree centrality after the incident
degree_after <- degree(g_after, mode = "all")

# Combine degree centrality measures into a data frame
centrality_change <- data.frame(
  id = names(degree_before),
  degree_before = degree_before,
  degree_after = degree_after
)

# Calculate the change in degree centrality
centrality_change <- centrality_change %>%
  mutate(change = degree_after - degree_before)

# Display companies with the most positive change in degree centrality
top_beneficiaries <- centrality_change %>%
  arrange(desc(change)) %>%
  head(10)

print(top_beneficiaries)
                                           id degree_before degree_after change
Anderson-Roberts             Anderson-Roberts             0           36     36
Hall, Hartman and Hall Hall, Hartman and Hall             0           30     30
Kirk Inc                             Kirk Inc             0           18     18
Watson-Gray                       Watson-Gray             0           18     18
Parker Inc                         Parker Inc             0           17     17
Mullins-Carrillo             Mullins-Carrillo             0           15     15
Torres, Ross and Brown Torres, Ross and Brown             0           14     14
Byrd and Sons                   Byrd and Sons             0           13     13
Haynes-Lucero                   Haynes-Lucero             0           13     13
Lutz-Fleming                     Lutz-Fleming             0           13     13

Part 2: Determine Entity Type

# Merge with cleaned_nodes to get the entity type
top_beneficiaries_info <- top_beneficiaries %>%
  left_join(cleaned_nodes, by = c("id" = "id")) %>%
  select(id, change,entity3)

# Display the entity type of top beneficiaries
print(top_beneficiaries_info)
                       id change        entity3
1        Anderson-Roberts     36 FishingCompany
2  Hall, Hartman and Hall     30 FishingCompany
3                Kirk Inc     18 FishingCompany
4             Watson-Gray     18 FishingCompany
5              Parker Inc     17 FishingCompany
6        Mullins-Carrillo     15 FishingCompany
7  Torres, Ross and Brown     14 FishingCompany
8           Byrd and Sons     13 FishingCompany
9           Haynes-Lucero     13 FishingCompany
10           Lutz-Fleming     13 FishingCompany

Part 2: Visualize the Changes

# Bar plot of top beneficiaries
ggplot(top_beneficiaries_info, aes(x = reorder(id, change), y = change)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top Beneficiaries by Change in Degree Centrality",
       x = "Company",
       y = "Change in Degree Centrality",
       fill = "Entity Type") +
  theme(legend.position = "none")

The results show that the top beneficiaries, all classified as fishing companies, significantly increased their network centrality following SouthSeafood Express Corp’s legal troubles. Anderson-Roberts, Hall, Hartman and Hall, and Kirk Inc., among others, saw the largest gains, suggesting they capitalized on the shift in the network’s structure.

By 2015, the network has grown denser, suggesting increased interconnectedness and influence consolidation. More VIPs are connected to multiple companies (orange nodes), indicating a significant rise in their influence and control over the network.

# Specify the year
filter_year <- 2025

# Filter vip_connections by start_year
vip_connections_filtered_2025 <- vip_connections %>%
  filter(format(start_date, "%Y") == filter_year)

# Create the graph object from the filtered vip_connections
g_vip_filtered_2025 <- graph_from_data_frame(d = vip_connections_filtered_2025, directed = TRUE)

# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2025)$type <- ifelse(V(g_vip_filtered_2025)$name %in% nodes_people$id, "VIP", "Company")

# Define colors and sizes
V(g_vip_filtered_2025)$color <- ifelse(V(g_vip_filtered_2025)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2025)$size <- ifelse(V(g_vip_filtered_2025)$type == "VIP", 8, 5)

# Plot the network
p2025 <- plot(g_vip_filtered_2025, vertex.label = NA, vertex.size = V(g_vip_filtered_2025)$size, edge.arrow.size = 0.5, 
     vertex.color = V(g_vip_filtered_2025)$color, main = paste("VIP Connections Network for", filter_year))

The network continues to expand in 2025, displaying even more complexity and interconnections. This period likely represents a peak in influence for several VIPs, with many of them owning shares in numerous companies, suggesting increased market control.

# Specify the year
filter_year <- 2035

# Filter vip_connections by start_year
vip_connections_filtered_2035 <- vip_connections %>%
  filter(format(start_date, "%Y") == filter_year)

# Create the graph object from the filtered vip_connections
g_vip_filtered_2035 <- graph_from_data_frame(d = vip_connections_filtered_2035, directed = TRUE)

# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2035)$type <- ifelse(V(g_vip_filtered_2035)$name %in% nodes_people$id, "VIP", "Company")

# Define colors and sizes
V(g_vip_filtered_2035)$color <- ifelse(V(g_vip_filtered_2035)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2035)$size <- ifelse(V(g_vip_filtered_2035)$type == "VIP", 8, 5)

# Plot the network
p2035 <- plot(g_vip_filtered_2035, vertex.label = NA, vertex.size = V(g_vip_filtered_2035)$size, edge.arrow.size = 0.5, 
     vertex.color = V(g_vip_filtered_2035)$color, main = paste("VIP Connections Network for", filter_year))

In 2035, the network structure shifts to a star-like formation, where a central VIP appears to have gained substantial influence, with direct connections to numerous companies. This indicates a significant consolidation of power and influence, where a few key players dominate the network.

Caution

Initially, influence is distributed among several key players, but over the years, it becomes concentrated among fewer individuals, leading to a highly centralized network by 2035. This centralization of power can be both an opportunity for streamlined decision-making and a risk for monopolistic control. Monitoring these changes is crucial for regulatory bodies like FishEye to ensure fair practices and prevent illegal activities within the network.